Recovering all generalized order-preserving submatrices: new exact formulations and algorithms

Authors

  • Andrew C. Trapp
  • Chao Li
  • Patrick Flaherty
Abstract

Cluster analysis of gene expression data is a popular and successful way of elucidating underlying biological processes. Typically, cluster analysis methods seek to group genes that are differentially expressed across experimental conditions. However, real biological processes often involve only a subset of genes and are activated in only a subset of environmental or temporal conditions. To address this limitation, Ben-Dor et al. (2003) developed an approach to identify order-preserving submatrices (OPSMs), in which the expression levels of the included genes induce the same linear ordering of experiments. In addition to gene expression analysis, OPSMs have applications in recommender systems and target marketing. While the problem of finding the largest OPSM is NP-hard, there have been significant advances in both exact and approximate algorithms in recent years. Building upon these developments, we provide two exact mathematical programming formulations that generalize the OPSM formulation by allowing for the reverse linear ordering, known as the generalized OPSM pattern, or GOPSM. Our formulations incorporate a constraint that provides a margin of safety against detecting spurious GOPSMs. Finally, we provide two novel algorithms that iteratively solve mathematical programming formulations to global optimality to recover, for any given level of significance, all GOPSMs from a given data matrix. We demonstrate the computational performance and accuracy of our algorithms on real gene expression data sets, showing the capability of our developments.

  • Andrew C. Trapp: Foisie School of Business, Worcester Polytechnic Institute, 100 Institute Rd., Worcester, MA 01609, USA. Tel.: +1-508-831-4935
  • Chao Li: Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Rd., Worcester, MA 01609, USA
  • Patrick Flaherty: Department of Mathematics and Statistics, University of Massachusetts, Amherst, MA 01003, USA
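To make the pattern described in the abstract concrete, here is a minimal Python sketch that only checks whether a chosen set of rows (genes) and columns (experiments) forms an OPSM or a GOPSM. It is not the authors' mathematical programming formulation or recovery algorithm. The function names, the toy matrix, and the reading of "generalized" as "each row follows either the reference ordering or its exact reverse" are assumptions made for illustration, and values within a row are assumed to be distinct.

```python
from typing import Sequence, Tuple


def column_order(row: Sequence[float], cols: Sequence[int]) -> Tuple[int, ...]:
    """Return the selected columns sorted by this row's values (ascending)."""
    return tuple(sorted(cols, key=lambda c: row[c]))


def is_opsm(matrix: Sequence[Sequence[float]],
            rows: Sequence[int], cols: Sequence[int]) -> bool:
    """True if every selected row induces the same ordering of the columns."""
    if not rows or not cols:
        return True
    reference = column_order(matrix[rows[0]], cols)
    return all(column_order(matrix[r], cols) == reference for r in rows)


def is_gopsm(matrix: Sequence[Sequence[float]],
             rows: Sequence[int], cols: Sequence[int]) -> bool:
    """True if every selected row induces the reference ordering or its reverse.

    Assumption: this is an illustrative reading of the generalized pattern,
    with no ties among the selected values of any row.
    """
    if not rows or not cols:
        return True
    reference = column_order(matrix[rows[0]], cols)
    allowed = (reference, reference[::-1])
    return all(column_order(matrix[r], cols) in allowed for r in rows)


# Hypothetical toy expression matrix: rows are genes, columns are experiments.
data = [
    [0.1, 0.9, 0.5, 0.3],
    [1.0, 4.0, 2.0, 7.0],
    [8.0, 1.0, 5.0, 6.0],
    [3.0, 2.0, 9.0, 1.0],
]

print(is_opsm(data, [0, 1], [0, 1, 2]))      # rows 0 and 1 share one ordering
print(is_opsm(data, [0, 1, 2], [0, 1, 2]))   # row 2 runs in the reverse order
print(is_gopsm(data, [0, 1, 2], [0, 1, 2]))  # the reverse order is allowed here
```

Checking a candidate is cheap; the hard part the abstract refers to is searching over all row and column subsets, which is what the exact formulations address.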

Related articles

A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences

Order-preserving submatrices (OPSMs) have been applied in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems, as an important unsupervised learning model. Unfortunately, because the problem is NP-complete, most existing methods are heuristic algorithms that cannot reveal all OPSMs. In particular, deep OPSMs, corresponding to long pa...

Towards Scalable Algorithms for Discovering Rough Set Reducts

Rough set theory allows one to find reducts from a decision table, which are minimal sets of attributes preserving the required quality of classification. In this article, we propose a number of algorithms for discovering all generalized reducts (preserving generalized decisions), all possible reducts (preserving upper approximations) and certain reducts (preserving lower approximations). The n...

Filtration Algorithms for Approximate Order-Preserving Matching

The exact order-preserving matching problem is to find all the substrings of a text T which have the same length and relative order as a pattern P. Like string matching, order-preserving matching can be generalized by allowing the match to be approximate. In approximate order-preserving matching, two strings match if they have the same relative order after removing up to k elements in the same p...

Extending the Order Preserving Submatrix: New patterns in datasets

This paper concerns finding local patterns in gene expression datasets. We present new order relation patterns and develop algorithms that find those patterns. Our algorithms are the first to find exact results for those patterns, yet in most cases they outperform existing heuristic algorithms. Finally, we present an algorithm for the broader problem of frequent itemset min...

Heuristic and exact algorithms for Generalized Bin Covering Problem

In this paper, we study the Generalized Bin Covering problem. For this problem, an exact algorithm is introduced that can find an optimal solution for small-scale instances. To find a near-optimal solution for large-scale instances, a heuristic algorithm is proposed. The efficiency of the heuristic algorithm is assessed through computational experiments.

Journal:
  • Annals OR

Volume 263

Publication date: 2018